First time here? Check out the FAQ!×
The 2012 Bioinformatics Survey

8

2

I'd like to visualise the results of a BLAST search in a genome browser. Is there a simple way to get the results in GFF format without having to write a parser myself?

asked Mar 12 2010 at 17:04
Michael Barton
1,224214

4 Answers

4

Start with tabulated blast output myfile.blast.out. Then check two-liners from:

http://bergman-lab.blogspot.com/2009/12/ncbi-blast-tabular-output-format-fields.html

Few lines tooutput proper gff are missing, but you may either go for minimalistic gff or try to encode everything in column 9. Also you may try validating your gff3 here:

http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

NOTE: The blog linked above does not seem to exist anymore, here is the content of it from the wayback machine:

NCBI Blast Tabular output format fields

Certainly, with the new NCBI Blast+ tools, you won't need this anymore, but as long as we are sticking with the old blastall programm with its horrible documentation, I keep forgetting the format of the BLAST tabular reports. Tabular format is created when you specify "-m 8". This is the most useful format for parsing blast yourself without having to learn strange libraries like BioPerl, BioJava, BioPython or BioErlang (doesn't this exist yet, Mike?)

So here is the meaning of the fields:

queryId, subjectId, percIdentity, alnLength, mismatchCount, 
gapOpenCount
, queryStart, queryEnd, subjectStart, subjectEnd, eVal, bitScore

Parsing is then simple

Python:

for line in open("myfile.blast"):
   
(queryId, subjectId, percIdentity, alnLength, mismatchCount,
    gapOpenCount
, queryStart, queryEnd, subjectStart,
    subjectEnd
, eVal, bitScore)= line.split("\t")

Perl:

while(<>){
   
($queryId, $subjectId, $percIdentity, $alnLength, $mismatchCount,
   $gapOpenCount
, $queryStart, $queryEnd, $subjectStart,
   $subjectEnd
, $eVal, $bitScore)= split(/\t/)
}
answered Mar 12 2010 at 18:12
darked89
2,848212
Using the examples this was relatively simple to do. The GFF3 validator helped too. – Michael Barton Mar 12 2010 at 21:39
2 
The blog post referred to above has moved to: bergmanlab.smith.man.ac.uk/?p=41 – Casey Bergman Sep 24 2010 at 20:03
3

I found this via google: http://jperl.googlecode.com/svn-history/r16/trunk/Blast2Gff.pl

else I would save my blast result as XML and transform it to GFF with with a (should be) simple XSLT stylesheet. As an example, you can have a look at my 'old' stylesheet blast2svg: http://code.google.com/p/lindenb/source/browse/trunk/src/xsl/blast2svg.xsl Pierre

answered Mar 12 2010 at 17:18
Pierre Lindenbaum♦♦
29.9k12462
Thanks I'll give it a go. I've never used perl before though. :S – Michael Barton Mar 12 2010 at 17:28
I just got a bunch of errors with this. :( "Use of uninitialized value $QEnd in numeric lt (<) at ./Blast2Gff.pl line 161, <BLASTIN> line 12." – Michael Barton Mar 12 2010 at 17:43
Sorry I'm not a perl guru too :-P – Pierre Lindenbaum Mar 12 2010 at 17:59
The blast2Gff.pl looks as if it uses Blast tabular format, did you try this? – Michael Dondrup Mar 12 2010 at 18:59
I was using the tab delimited output produced using the -m8 option but I was still getting the error unfortunately. – Michael Barton Mar 12 2010 at 21:40
show 2 more comments
3

http://jperl.googlecode.com/svn-history/r16/trunk/Blast2Gff.pl

You can use the script Pierre found swith a slight modification, actually it is a bit crude and does no real error checking but it works. The error is it does not work if the blast file has a header like this:

# BLASTN 2.2.21 [Jun-14-2009]
# Query: 16383 sequences
# Database: genomedata/GenomeOfDeath.fas
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
GeneOfDeath  GenomeOfDeath  100.00  295     0       0       1       295     152626  152920  4e-168   585

So, one should filter out lines beginning with "#" and it does no harm to skip lines which are empty or contain only white spaces.

So edit the file Blast2Gff.pl: in line 149 add:

 nextif(/^\#/||/^\s*$/);# filter comments and empty lines

Such that this part looks like below, then try again.

while(<BLASTIN>)
{

 
nextif(/^\#/||/^\s*$/);# filter comments and empty lines

$HitNum
++;

my($QryId, $SubId, $PID, $Len,
    $MisMatch
, $GapOpen,
    $QStart
,$QEnd, $SStart, $SEnd,
    $EVal
, $BitScore)= split(/\t/);
answered Mar 13 2010 at 20:46
Michael Dondrup
9,3891725
1

Have you tried these scripts: http://gmod.org/wiki/Load_BLAST_Into_Chado, http://www.bioperl.org/pipermail/bioperl-l/2002-November/010223.html ??

maybe the PSL format is better to represent an alignment. You can also look at the BED format so later you can play with BedTools

answered Mar 12 2010 at 17:29
Giovanni M Dall'Olio
8,64011535
I don't have bioperl on my computer and the documentation states that installing bioperl is "non-trivial" which is not encouraging. Perhaps it would be simpler just to write a parser myself. – Michael Barton Mar 12 2010 at 17:42

Your Answer

Welcome to BioStar!

http://www.biostars.org

Questions and answers on bioinformatics, computational genomics and systems biology.

tagged

 × 216
 × 83
 × 20

asked

1 year ago

viewed

2,844 times

latest activity

1 year ago


You can help too!
Spread the word!

Tell a colleague about BioStar!

http://www.biostars.org

Subscribe: RSS feed

Admin group: Biostar Central

Contact info for the site administrator • For general feedback, suggestions or to report a problem use the Biostar Central group.